EDF Statistics for Testing for the Gamma Distribution, with Applications.

By

A.N. Pettitt and M.A. Stephens


1. INTRODUCTION

Several years ago the authors found percentage points for EDF statistics
for the Gamma distribution with known shape parameter but unknown
scale parameter (the origin of the distribution was assumed to be zero).
These were issued as two Technical Reports (Pettitt, 1975; Stephens, 1975).

The gamma or equivalently the chi-squared distribution often occurs
as a possible model for observations or as the distribution of some
derived statistics. For example, in lifetime or survival studies the
gamma distribution can be proposed as the distribution of lifetime or
some function of lifetime. The chi-squared distribution occurs as the
distribution for sums of squares in ANOVA tables and also in time series
analysis, in the study of periodograms, and in multivariate analysis, as
the distribution of squared radii. A review of possible uses of the gamma
distribution is given by Johnson and Kotz (1970, §17).

Despite the many uses of the gamma distribution, the application of
the tables might seem to be limited, because the shape parameter must be
known. In the earlier reports, we gave an illustration of the use of EDF
statistics to test that the sample variances of cells of an ANOVA table,
with the same number of observations in each cell, all came from the same
distribution. Since then, other applications have come to light and these
will be presented in this report; also the test procedures and the tables
will be reproduced to make the report complete in itself.




2. TEST PROCEDURES.


2.1. The Gamma Distribution.

The null hypothesis to be tested is

$H_0$: a random sample $x_1, \ldots, x_n$,

with distribution function $F(x)$, comes from the gamma population with
known shape parameter $m$ but unknown scale parameter $\theta$. The null
hypothesis is therefore

(1)  $H_0$: $F(x) = G(\theta x, m)$,

where $G(z,m)$ is the gamma distribution function given by

$$G(z,m) = \frac{1}{\Gamma(m)} \int_0^z t^{m-1} e^{-t}\,dt.$$
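
As a computational aside, $G(z,m)$ is the regularized lower incomplete gamma function. The sketch below is one way to evaluate it in plain Python, using a power series for small $z$ and a continued fraction for the upper tail otherwise. The function name `gamma_cdf`, the switch point $z < m+1$ and the tolerances are illustrative choices, not part of the report. Where SciPy is available, `scipy.special.gammainc(m, z)` returns the same quantity.

```python
import math

def gamma_cdf(z, m, tol=1e-12, max_iter=10_000):
    """Regularized lower incomplete gamma function G(z, m) for shape m > 0."""
    if z <= 0.0:
        return 0.0
    # log of z^m e^{-z} / Gamma(m), the prefactor shared by both expansions
    log_pref = m * math.log(z) - z - math.lgamma(m)
    if z < m + 1.0:
        # Power series: G = pref * sum_{k>=0} z^k / (m (m+1) ... (m+k))
        term = 1.0 / m
        total = term
        for k in range(1, max_iter):
            term *= z / (m + k)
            total += term
            if abs(term) < tol * abs(total):
                break
        return math.exp(log_pref) * total
    # Continued fraction (modified Lentz method) for the upper tail 1 - G
    tiny = 1e-300
    b = z + 1.0 - m
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for k in range(1, max_iter):
        an = -k * (k - m)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return 1.0 - math.exp(log_pref) * h
```

For integer $m$ the result can be checked against the closed form $1 - e^{-z}\sum_{j<m} z^j/j!$; for a $\chi^2_p$ variable $r$, the distribution function is `gamma_cdf(r / 2, p / 2)`.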

When the parameter $\theta$ is known, the null hypothesis $H_0$ can
be tested using a whole variety of statistics, since the probability
integral transformation, $u_i = G(\theta x_i, m)$, produces $u_1, \ldots, u_n$
which behave like a random sample from the uniform (0,1) distribution when
$H_0$ is true. Pearson and Hartley (1972, p. 117) consider some suitable
statistics.

When the parameter $\theta$ is unspecified and has to be estimated from
the sample, the problem of testing $H_0$ becomes more complicated.
Statistics which are both useful in practice and analytically tractable
are EDF statistics, based on the empirical distribution function $F_n(x)$.
In particular the weighted Cramér-von Mises statistics have been found to
have good power properties.

A Cramér-von Mises type statistic is defined in general terms by

$$V^* = n \int \{F_n(x) - F_0(x;\theta)\}^2\, \psi\{F_0(x;\theta)\}\, dF_0(x;\theta),$$

where $F_0(x;\theta)$ is the distribution function specified by the goodness-
of-fit null hypothesis, and in this particular case $F_0(x;\theta) = G(\theta x, m)$.
The function $\psi(\cdot)$ is a positive weight function. Other EDF statistics
such as the Kolmogorov-Smirnov type statistics tend to be less powerful
than the Cramér-von Mises type statistics (see, for example, Stephens,
1974, §5). In this report we give the asymptotic distributions of three
well known statistics, $W^2$, the Cramér-von Mises statistic, $A^2$, the
Anderson-Darling statistic, and $U^2$, the Watson statistic, when $\theta$ is
replaced by its maximum likelihood estimate $\hat\theta = m/\bar{x}$, where $\bar{x} = \sum x_i/n$.
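
For orientation, the standard definitions of the three statistics can be written in the integral notation of this section. These are the usual textbook forms, stated here as background rather than taken from this report; $W^2$ and $A^2$ correspond to particular weight functions, while $U^2$ is a location-corrected variant of $W^2$.

```latex
\begin{align*}
W^2 &= n \int \{F_n(x) - F_0(x;\theta)\}^2 \, dF_0(x;\theta),
      && \psi(u) = 1, \\
A^2 &= n \int \frac{\{F_n(x) - F_0(x;\theta)\}^2}
                   {F_0(x;\theta)\,\{1 - F_0(x;\theta)\}} \, dF_0(x;\theta),
      && \psi(u) = \frac{1}{u(1-u)}, \\
U^2 &= n \int \Bigl\{ F_n(x) - F_0(x;\theta)
        - \int \{F_n(y) - F_0(y;\theta)\} \, dF_0(y;\theta) \Bigr\}^2 dF_0(x;\theta).
\end{align*}
```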

For  increasing  sample  size  n,  percentage  points  of  the  test 
statistics  converge  rapidly  to  the  asymptotic  percentage  points.  A 
very  small  modification  to  the  value  of  the  test  statistic  then  makes 
it  possible  to  use  only  the  asymptotic  points  when  making  the  test.  The 
modifications  are  on  the  lines  of  those  given  in  Stephens  (1974)  and  in 
Pearson  and  Hartley  (1972,  Table  54). 

2.2.  Test  Procedure. 

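For a set of observations $x_1, \ldots, x_n$, we give here the procedure
to be followed to test $H_0$ given in (1).

(a) Put the observations in ascending order, $x_1 \le x_2 \le \cdots \le x_n$;
let $\bar{x} = \sum x_i/n$.

(b) Use $\hat\theta = m/\bar{x}$ to calculate $z_i = \hat\theta x_i$, $i = 1, \ldots, n$.

(c) Find $w_i = G(z_i, m)$ for $i = 1, \ldots, n$, and the mean $\bar{w} = \sum w_i/n$.

(d) Calculate

$$W^2 = \sum_{i=1}^{n} \left\{ w_i - \frac{2i-1}{2n} \right\}^2 + \frac{1}{12n}, \qquad U^2 = W^2 - n\left(\bar{w} - \tfrac{1}{2}\right)^2,$$

$$A^2 = -\frac{1}{n} \sum_{i=1}^{n} (2i-1)\{\ln w_i + \ln(1 - w_{n+1-i})\} - n.$$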
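
Step (d) can be sketched in a few lines of Python. The function below takes the transformed values $w_i$ from step (c) as given and uses the standard computing formulas for $W^2$, $U^2$ and $A^2$; the name `edf_statistics` is an illustrative choice, and the code assumes every $w_i$ lies strictly between 0 and 1 so that the logarithms exist.

```python
import math

def edf_statistics(w):
    """Return (W2, U2, A2) from probability-integral transforms w_i in (0, 1)."""
    w = sorted(w)                      # step (a) ordering carries over to the w_i
    n = len(w)
    w_bar = sum(w) / n
    # With 0-based index i, the 1-based factor (2i - 1) becomes (2i + 1)
    W2 = sum((w[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)) + 1 / (12 * n)
    U2 = W2 - n * (w_bar - 0.5) ** 2
    # w_{n+1-i} in 1-based notation is w[n - 1 - i] in 0-based notation
    A2 = -sum((2 * i + 1) * (math.log(w[i]) + math.log(1 - w[n - 1 - i]))
              for i in range(n)) / n - n
    return W2, U2, A2
```

When the $w_i$ fall exactly at the values $(2i-1)/(2n)$, the sum in $W^2$ vanishes and $W^2 = U^2 = 1/(12n)$, which gives a convenient sanity check.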


(e) Table 1 gives asymptotic percentage points for the test statistics.
In order to use these with a sample of size $n$, we calculate modified
statistics $W^*$, $U^*$ and $A^*$, as follows. For $m = 1$, calculate
$W^* = W^2(1+0.16/n)$, $U^* = U^2(1+0.16/n)$ or $A^* = A^2(1+0.6/n)$; for
$m > 1$, calculate

$$W^* = \frac{1.8nW^2 - 0.14}{1.8n - 1}, \qquad U^* = \frac{1.8nU^2 - 0.14}{1.8n - 1}, \qquad A^* = A^2 + \frac{1}{n}\left(0.2 + \frac{0.3}{m}\right).$$

The modified statistics are then referred to the upper tail asymptotic
points to make the test; if a modified statistic exceeds the appropriate
table entry in Table 1, for the statistic used and for given $m$ and
percentage level $\alpha$, reject $H_0$ at significance level $\alpha$.

Illustration. Suppose a sample of size $n = 10$ gives $W^2 = 0.170$,
$U^2 = 0.155$ and $A^2 = 1.102$, in a test for the gamma distribution with
$m = 4$; from the formula above $W^* = (18(0.170) - 0.14)/17 = 0.172$;
similarly $U^* = (18(0.155) - 0.14)/17 = 0.156$, and $A^* = 1.102 + 0.1(0.275)
= 1.130$. Comparison with the asymptotic points shows that $W^*$ is
significant at approximately the 5.8% level, $U^*$ at the 5.3% level, and
$A^*$ at the 5.5% level.
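
The modifications of step (e) and the arithmetic of the illustration can be reproduced with a short sketch. The function name `modified_statistics` is hypothetical. The $0.3/m$ term in $A^*$ is taken to be the one implied by the illustration, where $0.2 + 0.3/4 = 0.275$; treat it as an assumption if working from a different copy of the report.

```python
def modified_statistics(W2, U2, A2, n, m):
    """Finite-sample modifications of W2, U2, A2 for use with asymptotic points."""
    if m == 1:
        # Exponential case
        return W2 * (1 + 0.16 / n), U2 * (1 + 0.16 / n), A2 * (1 + 0.6 / n)
    W_star = (1.8 * n * W2 - 0.14) / (1.8 * n - 1)
    U_star = (1.8 * n * U2 - 0.14) / (1.8 * n - 1)
    A_star = A2 + (0.2 + 0.3 / m) / n   # assumed form, matching the illustration
    return W_star, U_star, A_star

# Values from the illustration: n = 10, m = 4
W_star, U_star, A_star = modified_statistics(0.170, 0.155, 1.102, n=10, m=4)
print(f"W* = {W_star:.4f}, U* = {U_star:.4f}, A* = {A_star:.4f}")
```

The three values agree with the illustration to three decimal places and can be compared directly with the $m = 4$ rows of Table 1.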

For the different statistics, extensive Monte Carlo studies were
made to give the percentage points for $n = 5, 8, 10, 20$ and $50$. These
were plotted against $1/n$ and smoothed, and the modifications transform
the percentage point for finite $n$ to a value which closely approximates
the asymptotic value for the same $\alpha$. They have been devised to be
independent of $\alpha$, and, as much as possible, to be independent also of $m$;
thus extensive tables of percentage points, for each combination of $n$,
$m$ and $\alpha$, have been condensed into Table 1. If the true percentage level
is $\alpha'$, the modifications give an error $|\alpha - \alpha'|$ less than 0.5% for
$n \ge 5$, and for $n \ge 8$ the accuracy will usually be much higher.


3.  APPLICATIONS. 

3.1.  Application  to  Tests  for  Multivariate  Normality. 

Suppose $y$ is a vector of $p$ observations, and it is wished
to test $H_0$: a random sample $y_1, \ldots, y_n$ comes from a multivariate
normal distribution with mean $\mu$ and covariance matrix $\Sigma$. The
quadratic forms $r_i = (y_i-\mu)'\Sigma^{-1}(y_i-\mu)$, $i = 1, \ldots, n$ will, on $H_0$,
be distributed as $\chi^2_p$, and so can be tested to come from $G(r/2, m)$,
where $m = p/2$. This idea, and others related to it, appears frequently
in the literature. In such a case, all parameters are specified, and
EDF statistics could be used with Case 0 (Stephens, 1974). However,
it might be a more robust procedure to test instead for $G(\theta r, m)$, with
$\theta$ to be estimated; this concentrates on the shape of the quadratic form
even if there is a misspecification of the parameters, particularly of $\Sigma$.
Improvement in power for tests of distributional form, when an estimate
of a parameter is used even though its value, on $H_0$, should be known,
has already been noted by Stephens (1974) and Dyer (1974), in the context
of tests of normality for univariate data.

When the $\mu$ and $\Sigma$ of the quadratic forms $r_i$ are replaced by
their sample values $\bar{y}$ and $S = n^{-1} \sum (y_i-\bar{y})(y_i-\bar{y})'$, giving
$r_i = (y_i-\bar{y})' S^{-1} (y_i-\bar{y})$, then each $r_i$ has an approximate marginal
$\chi^2_p$ distribution; see, for example, Gnanadesikan (1979, §5.4). The
$r_i$ have the additional property that $\sum r_i = np$, so that the null
distribution of the $r_i$'s is approximately the same as that of the $x_i$ of
section 2.2, and tests of $H_0$ can be made using the procedure of section
2.2 with the $r_i$'s as the observations (so that $z_i = r_i/2$), and $m = p/2$.
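
A minimal sketch of the computation of the $r_i$ is given below for the bivariate case $p = 2$, where the inverse of $S$ can be written out explicitly. The function name `squared_radii` and the data are hypothetical, chosen only to show that with divisor $n$ in $S$ the $r_i$ sum to $np$, so that the estimate $\hat\theta = m/\bar{r}$ of section 2.2 equals $1/2$.

```python
def squared_radii(data):
    """r_i = (y_i - ybar)' S^{-1} (y_i - ybar) for bivariate data, S with divisor n."""
    n = len(data)
    mean = [sum(y[j] for y in data) / n for j in range(2)]
    dev = [(y[0] - mean[0], y[1] - mean[1]) for y in data]
    s11 = sum(a * a for a, _ in dev) / n
    s22 = sum(b * b for _, b in dev) / n
    s12 = sum(a * b for a, b in dev) / n
    det = s11 * s22 - s12 * s12
    # Explicit 2x2 inverse: S^{-1} = [[s22, -s12], [-s12, s11]] / det
    return [(s22 * a * a - 2 * s12 * a * b + s11 * b * b) / det for a, b in dev]

# Hypothetical bivariate sample
data = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0), (5.0, 5.5), (2.5, 4.0)]
r = squared_radii(data)
p, n = 2, len(data)
theta_hat = (p / 2) / (sum(r) / n)           # m / mean(r) with m = p/2
z = sorted(theta_hat * ri for ri in r)       # values passed to steps (c)-(e)
```

For general $p$ the same quantities would normally be obtained with a linear algebra library; the explicit inverse is used here only to keep the sketch self-contained.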


3.2.  Time  Series  Applications. 

A variation of Bartlett's test (see Cox and Lewis, 1966, §6.4) for
the independence of intervals in a renewal process can be devised using
the procedure of section 2.2. Bartlett's test involves the smoothing
of $m$ adjacent periodogram coordinates and then dividing by the spectral
density function, which is assumed known up to some unknown scale factor.
This gives rise to scaled periodogram values, $x_1, \ldots, x_n$, which have an
approximate $G(\theta x, m)$ distribution function, when the correct spectral
density function is used. The unknown scale parameter $\theta$ can be estimated
using the technique of section 2.2 and the $z_i$'s found. A test of the
correctly specified spectral density function can be made using the
procedure of section 2.2 for various values of $m$.


Another application follows ideas mentioned in Cox and Lewis (1966,
§3.2). In a Poisson process, suppose we consider the times between the
$(j-1)m$-th and $jm$-th events, $j = 1, 2, \ldots$, for fixed $m$. Although it
might not be reasonable to assume that the times between $m$ events
are independent and have the $G(\theta x, m)$ distribution function for, say,
$m = 1$ or $2$, it might be reasonable to believe that the assumption
is true for $m = 4, 5, \ldots$. A test of this hypothesis can be made by
taking $x_1, x_2, \ldots$ as the times between the origin and $m$-th event,
between the $m$-th event and the $2m$-th event, and so on, and then following
section 2.2.
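
The grouping of event times into intervals spanning $m$ events can be sketched as follows. The function name `grouped_intervals` and the event times are hypothetical; the origin is taken to be time zero, and any events left over after the last complete group of $m$ are ignored.

```python
def grouped_intervals(event_times, m):
    """Times between the (j-1)m-th and jm-th events, j = 1, 2, ..., origin at 0."""
    times = sorted(event_times)
    # Times of the m-th, 2m-th, ... events, preceded by the origin
    marks = [0.0] + [times[j * m - 1] for j in range(1, len(times) // m + 1)]
    return [b - a for a, b in zip(marks, marks[1:])]

# Hypothetical event times from a point process
events = [0.4, 1.1, 1.5, 2.9, 3.0, 4.2, 5.8, 6.1, 7.5]
m = 3
x = grouped_intervals(events, m)            # approximately [1.5, 2.7, 3.3]
theta_hat = m / (sum(x) / len(x))           # estimate of the scale parameter
z = sorted(theta_hat * xi for xi in x)      # input to steps (c)-(e) of section 2.2
```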

3.3.  Survival  Time  Analysis. 

In survival data analysis many models have been proposed for the
distribution of survival time; see, for example, Kalbfleisch and Prentice
(1980, §3). One model is to assume that the survival time $t$ has a
distribution so that $x = \exp(t)$ has the $G(\theta x, m)$ distribution function
with $\theta$ unknown and perhaps $m$ unknown. On the basis of a random
sample of $t$'s giving rise to $x_1, \ldots, x_n$, using the transformation
$x = \exp(t)$, we can test the goodness-of-fit of the gamma model with $\theta$
estimated for various specified values of $m$. We might use the value
of $m$ giving rise to the least significant fit as an estimate of the
true value of $m$. Note that the test procedures of section 2.2 cannot
deal with censored observations.

3.4. Equality of Variances in ANOVA.

Suppose the variance of a random sample $y_1, \ldots, y_p$ is
$s^2 = \sum_j (y_j - \bar{y})^2/(p-1)$. Given $k$ such independent samples, each of
size $p$, let $s_i^2$ be the variance of sample $i$, estimating the population
variance $\sigma_i^2$. Assuming that the samples are normally distributed,
the null hypothesis of homogeneity $H_{01}$: $\sigma_1^2 = \cdots = \sigma_k^2$ can be
tested using the procedure of section 2.2.

(a) Put the $s_i^2$ in ascending order.

(b) Calculate $z_i = (p-1)k s_i^2/(2\sum_j s_j^2)$.

(c) Follow steps (c), (d) and (e) in section 2.2, with $m = (p-1)/2$, and
with $n = k$.

This adaptation follows easily from the fact that the $(p-1)s_i^2/(2\sigma^2)$
are, on $H_{01}$, independently Gamma distributed with $m = (p-1)/2$.
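
Steps (a) and (b) of this adaptation amount to the following short computation. The function name `scaled_variances` and the cell data are hypothetical; `statistics.variance` uses the divisor $p-1$, matching the definition of $s^2$ above.

```python
from statistics import variance

def scaled_variances(samples):
    """Return (z, m): z_i = (p-1) k s_i^2 / (2 sum_j s_j^2) in ascending order, m = (p-1)/2."""
    p = len(samples[0])
    if any(len(s) != p for s in samples):
        raise ValueError("all cells must have the same number of observations")
    k = len(samples)
    s2 = sorted(variance(s) for s in samples)    # step (a): ordered sample variances
    total = sum(s2)
    z = [(p - 1) * k * v / (2 * total) for v in s2]   # step (b)
    return z, (p - 1) / 2

# Hypothetical ANOVA cells, k = 4 cells with p = 5 observations each
cells = [[4.1, 5.0, 3.8, 4.6, 5.2],
         [6.3, 5.9, 7.1, 6.0, 6.8],
         [2.2, 3.9, 3.1, 2.7, 3.5],
         [5.5, 4.9, 5.1, 6.2, 5.8]]
z, m = scaled_variances(cells)
```

The $z_i$ sum to $km$ by construction, which is the same constraint satisfied by the $z_i$ of section 2.2 with $n = k$.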

One disadvantage of using the EDF statistics, in this context, is
that they suffer, but not nearly to the same extent as the traditional
tests for homogeneity of variance (by these we mean Bartlett's, Hartley's
and Cochran's tests; see Pearson and Hartley, 1966, pp. 202-4), from being
biased, that is, the power is less than the significance level, for
alternatives which encompass both heterogeneity of variances ($H_{01}$ not
true) and non-normality of the samples $y_1, \ldots, y_p$. This happens when
the $y$'s come from short tailed distributions ($\beta_2 \le 2$) and the
variances satisfy relationships such as $\sigma_1^2 = \cdots = \sigma_\ell^2$,
$\sigma_{\ell+1}^2 = \cdots = \sigma_{2\ell}^2 = 2\sigma_1^2$, $\sigma_{2\ell+1}^2 = \cdots = \sigma_{3\ell}^2 = 3\sigma_1^2$,
where $\ell$ is a divisor of $k$.
These results for the EDF statistics and the traditional statistics were
found using Monte Carlo methods, and were discussed in the earlier
Technical Reports. However the results for Bartlett's statistic, $M$,
can be deduced from Box's (1953) result, which showed that if $s^2$ is a
sample variance based on $\nu$ degrees of freedom, from a population with
kurtosis $\beta_2$, then $s^2$ is distributed in large samples as a normal sample
variance based on $\nu\delta$ degrees of freedom, where

$$\delta = 2(\beta_2 - 1)^{-1},$$

when $\nu \to \infty$ and $k$ is kept constant. Box showed that, for homogeneous
samples, Bartlett's $M$ is asymptotically distributed as a $\delta^{-1}\chi^2_{k-1}$
random variable.
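
A short worked example may help to show the size of the effect implied by Box's result. The two kurtosis values used are standard facts ($\beta_2 = 3$ for the normal distribution and $\beta_2 = 1.8$ for the uniform distribution); the choice of the uniform as the short tailed case is ours, for illustration only.

```latex
\begin{align*}
\text{Normal: } \beta_2 = 3
  &\;\Rightarrow\; \delta = \frac{2}{3 - 1} = 1,
  & M &\sim \chi^2_{k-1}, \\
\text{Uniform: } \beta_2 = 1.8
  &\;\Rightarrow\; \delta = \frac{2}{1.8 - 1} = 2.5,
  & M &\sim 0.4\,\chi^2_{k-1}.
\end{align*}
```

In the uniform case $M$ is, in large samples, only about 40% of the size assumed when it is referred to $\chi^2_{k-1}$ points, so the test rejects far less often than its nominal level, which is consistent with the bias described above.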

We therefore suggest that the joint normality of the samples is
assessed using other goodness-of-fit procedures, such as those of Wilk
and Shapiro (1968) or Pettitt (1977). If normality is accepted then
$H_{01}$ can be tested using the EDF statistics or Bartlett's statistic,
which, of course, has the optimal property that it is equivalent to the
likelihood ratio test of $H_{01}$ against $H_{01}$ not true.

3.5. Test for the Dirichlet Distribution.

Using the theory of Wilks (1962, §7.7.1) it follows that the $z_i/(nm)$
($i = 1, \ldots, n$) behave like the order statistics of an $(n-1)$-variate
Dirichlet random variable $v$ ($v_n = 1 - \sum_{i=1}^{n-1} v_i$) with distribution
$D(m, \ldots, m; m)$. Thus the procedure of section 2.2 can be used to test
the goodness-of-fit of an observation on $v$ to the Dirichlet distribution.

The applications which have been discussed cover a range of uses of
the Gamma distribution in statistical work. In most cases, a test of $H_0$
would probably only be one part of an overall analysis of the data, and
would be complementary to other analyses and tests; previous experience
with EDF tests suggests that they would be useful in this general context,
and it is hoped to examine their efficacy in later work.


TABLE 1

UPPER TAIL PERCENTAGE POINTS OF THE
ASYMPTOTIC DISTRIBUTIONS OF $W^2$, $U^2$ AND $A^2$.

Parameter $m$ is the shape parameter in Equation (1).
If the test is for a $\chi^2_p$ distribution, $p = 2m$.

Statistic     m      10%      5%     2.5%      1%

$W^2$         1     .175    .222    .270    .338
              2     .156    .195    .234    .288
              3     .149    .185    .222    .271
              4     .146    .180    .215    .262
              5     .144    .177    .211    .257
              6     .142    .175    .209    .254
              8     .140    .173    .205    .250
             10     .139    .171    .204    .247
             12     .138    .170    .202    .245
             15     .138    .169    .201    .244
             20     .137    .169    .200    .243
              ∞     .135    .165    .196    .237

$U^2$         1     .129    .159    .189    .230
              2     .129    .158    .188    .228
              3     .128    .158    .187    .227
              4     .128    .158    .187    .227
              5     .128    .158    .187    .227
              6     .128    .157    .187    .227
              8     .128    .157    .187    .227
             10     .128    .157    .187    .227
             12     .128    .157    .187    .227
             15     .128    .157    .187    .227
             20     .128    .157    .187    .227
              ∞     .128    .157    .187    .227

$A^2$         1    1.062   1.321   1.591   1.959
              2     .989   1.213   1.441   1.751
              3     .959   1.172   1.389   1.683
              4     .944   1.151   1.362   1.648
              5     .935   1.139   1.346   1.627
              6     .928   1.130   1.335   1.612
              8     .919   1.120   1.322   1.595
             10     .915   1.113   1.314   1.583
             12     .911   1.110   1.310   1.578
             15     .908   1.106   1.304   1.570
             20     .905   1.101   1.298   1.562
              ∞     .893   1.087   1.281   1.551